Abstract

Automated fault detection is an increasingly important problem in 
aircraft maintenance and operation. Standard methods of fault detection 
assume the availability of either data produced during all possible faulty 
operation modes or a clearly-defined means to determine whether the data 
provide a reasonable match to known examples of proper operation. In 
the domain of fault detection in aircraft, identifying all possible faulty 
and proper operating modes is clearly impossible. We envision a system 
for online fault detection in aircraft, one part of which is a classifier that 
predicts the maneuver being performed by the aircraft as a function of vi- 
bration data and other available data. To develop such a system, we use 
flight data collected under a controlled test environment, subject to many 
sources of variability. We explain where our classifier fits into the envi- 
sioned fault detection system as well as experiments showing the promise 
of this classification subsystem. 


1 Introduction 

A critical aspect of the operation and maintenance of aircraft is detecting prob- 
lems in their operation when they occur in flight. This allows maintenance and 
flight crews to fix problems before they become severe and lead to significant 
aircraft damage or even a crash. Fault detection systems designed for this pur- 
pose are becoming a standard requirement in most aircraft [4, 10]. However, 
most systems produce too many false alarms, mainly due to an inability to com- 
pare real behavior with modeled behavior, making their reliability questionable 
in practice [9]. Other systems require a clearly-defined means to determine
whether the data provide a reasonable match to known examples of proper
operation or assume the availability of data produced during all possible faulty
operation modes [4, 5, 10]. Because of the highly safety-critical nature of the
aircraft domain, most fault detection systems must function well
even though fault data are not available and the set of possible faults is
unknown. Models are typically used to predict the effect of damage and failures
on otherwise healthy (baseline) data [6, 9]. However, while models are a
necessary first step, the modeled system response often does not take operational
variability into account, resulting in high false-alarm rates. Novelty detection
is one approach to overcoming this problem: it models the proper operation of a
system and detects when the system's operation deviates significantly from
normal operation [5, 7].

Table 1: Conceptual open loop model illustrating assumed causal relationships.

Figure 1: Online Fault Detection System Block Diagram.

In this paper, we present an approach to novelty detection for in-flight air- 
craft data. The data were collected as part of a research effort to understand the 
sources of variability present in the actual flight environment, with the purpose 
of reducing the high rates of false alarms [6, 12]. In past work, we have
described aircraft operation conceptually according to the open-loop causal model
shown in table 1. We assume that the maneuver being performed (M) influences
the observable aircraft attitudes (A), which in turn influence the set of possibly
observable physical inputs (I) to the transmission. The physical inputs
influence the transmission in a variety of ways that are not typically observable (R).
However, there are outputs that can be observed (O). Our approach to fault
detection in aircraft depends fundamentally on the assumption that the nature
of the relationships between the elements M, A, I, R, and O described above
changes when a fault materializes. Many approaches to fault detection attempt
to model only the set of possible outputs (O) and indicate the presence of a
fault when the actual outputs do not match the model. However, this approach
is difficult because the output space is often too complicated to allow faithful
modeling and measurement of differences between the modeled and actual outputs.
This latter difficulty remains even if one attempts to model the output as a
function of something that influences it such as the physical inputs or the flight
maneuver. Approaches to fault diagnosis (e.g., [17]) attempt to predict either
normal operation or one of a designated set of faults. As stated earlier, this is
not possible in the aircraft domain because the set of possible faults is unknown
and fault data are non-existent. For this reason, we envision a fault detection
system containing a classifier that models the flight maneuver (M) as a function
of the outputs (O). This allows us to measure differences between modeled and
actual operation in the space of flight maneuvers, which is a much simpler space
than the space of vibration signals (O). We would like to harness this fact in
our system.

Figure 1 is a block diagram of the system that we envision for online fault 
detection. A fundamental idea is the use of multiple sources of information to 
predict aspects of the state of the system being modeled, such as the maneuver 
being performed and predicting faults when the system state predictions are 
incompatible. In this paper, we present several maneuver classifiers, which are 
depicted in figure 1 in the top two blocks marked “Maneuver Classifier.” These 
classifiers take vibration data from various accelerometers and/or other available 
data as input and predict the maneuver being performed. Multiple classifiers 
that predict the maneuver may be present in the fault detection system. Mod- 
els of aircraft operation that generate predictions of vibration signatures, which 
are in the mold of traditional fault detection systems that we described earlier, 
may also be included in this system (the lowest box marked “System Model”). 
The goal of this work is to develop a fault detection system which compares the 
maneuver predictions from the various maneuver classifiers and uses other ap- 
propriate data to diagnose whether a fault is present based on these predictions. 
For example, if a vibration data-based classifier predicts that the helicopter is 
flying forward at high speed, but other data and/or subsystems indicate that 
the aircraft is on the ground, then the probability that a fault is present is high. 
Additionally, our fault detection system can use physical constraints. For
example, if the predicted maneuver fluctuates more rapidly than is physically
possible, then we hypothesize that the probability that a fault is present is high.

Table 2: Flight Protocol for Each Phase of Experiment.
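
The two checks described above (disagreement among maneuver predictions and physically implausible fluctuation) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the representation of predictions as maneuver symbols, and the `max_switch_rate` threshold of 0.2 are hypothetical and are not taken from the system described in this paper.

```python
from typing import Sequence


def classifiers_disagree(predictions: Sequence[str]) -> bool:
    """Return True when the maneuver classifiers do not all agree."""
    return len(set(predictions)) > 1


def switch_rate(maneuvers: Sequence[str]) -> float:
    """Fraction of consecutive predictions whose maneuver label changes."""
    if len(maneuvers) < 2:
        return 0.0
    changes = sum(a != b for a, b in zip(maneuvers, maneuvers[1:]))
    return changes / (len(maneuvers) - 1)


def fault_suspected(per_classifier: Sequence[str],
                    recent: Sequence[str],
                    max_switch_rate: float = 0.2) -> bool:
    """Flag a possible fault if the classifiers disagree on the current
    maneuver or if the recent predictions switch too often.
    The threshold is a hypothetical value chosen for illustration."""
    return (classifiers_disagree(per_classifier)
            or switch_rate(recent) > max_switch_rate)
```

For instance, `fault_suspected(["FFHS", "G"], ["G"] * 5)` flags the case in which one classifier predicts high-speed forward flight while another indicates that the aircraft is on the ground.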

In the remainder of this paper, section 2 discusses the two aircraft studied 
and the data generated from them. We discuss the machine learning methods 
that we used and the data preparation that we performed in order to use these 
methods in section 3. We discuss the experimental results in section 4. We 
summarize the results of this paper and discuss ongoing and future work in 
section 5. 

2 Aircraft Data 

The data used in this work were collected from two helicopters: an AH1 Cobra
and an OH58c Kiowa [6]. The data were collected by having two pilots each fly two
designated sequences of steady-state maneuvers according to a predetermined
test matrix (table 2). The test matrix uses a modified Latin-square design to
counterbalance changes in wind conditions, ambient temperature, and fuel depletion. Each of
the four flights consisted of an initial period on the ground with the helicopter 
blades at flat pitch, followed by a low hover, a sequence of maneuvers drawn 
from the 12 primary maneuvers, a low hover, and finally a return to ground 
(the list of maneuvers is shown in table 3). Each maneuver was scheduled to 
last 34 seconds in order to allow a sufficient number of cycles of the main rotor 
and planetary gear assembly to apply the signal decomposition techniques used 
in the previous studies [6]. 

Summary matrices were created from the raw data by averaging the data
produced during each revolution of the planetary gear. The summarized data
consist of 31475 revolutions of data for the AH1 and 34144 revolutions of data
for the OH58c. Each row, representing one revolution, indicates the maneuver
being performed during that revolution as well as the following 30 quantities:
revolutions per minute of the planetary gear, torque (mean, standard deviation,
skew, and kurtosis), and vibration data from six accelerometers (root-mean-square,
skew, kurtosis, and a binary variable indicating whether signal clipping
occurred). For the AH1, the mean and standard deviations were available for
the following attitude data from a 1553 bus: altitude, speed, rate of climb,
heading, bank angle, pitch, and slip.

Table 3: Aircraft Maneuvers for Phases 1 and 2.

     Maneuver Name                Symbol  Description
A    Forward Flight, Low Speed    FFLS    Fly straight, level, & forward at 20 kts.
B    Forward Flight, High Speed   FFHS    Fly straight, level, & forward at 60 kts.
C    Sideward Flight Left         SL      Fly straight, level, & sideward left.
D    Sideward Flight Right        SR      Fly straight, level, & sideward right.
E    Forward Climb, Low Power     FCLP    Fly forward, straight, & climb at 40 psi.
F    Forward Descent, Low Power   FDLP    Fly forward, straight, & descend at 10 psi.
G    Flat Pitch on Ground         G       Vehicle on ground skids.
H    Hover                        H       Stationary hover.
I    Hover Turn Left              HTL     Level hover, turning left.
J    Hover Turn Right             HTR     Level hover, turning right.
K    Coordinated Turn Left        CTL     Fly level, forward, & turning left.
L    Coordinated Turn Right       CTR     Fly level, forward, & turning right.
M    Forward Climb, High Power    FCHP    Fly forward, straight, & climb at 50 psi.
N    Forward Descent, High Power  FDHP    Fly forward, straight, & descend at 50 psi.

3 Methods 

Sample torque and RPM data from one maneuver, separated by pilot and by
flight, are shown in figures 2 and 3, respectively. The highly variable nature of
the data, as well as differences due to different pilots and different days when
the aircraft were flown, are clearly visible and make this a challenging
classification problem. To perform the necessary mapping for this problem, we chose
multilayer perceptrons (MLPs) with one hidden layer and radial basis function 
(RBF) networks as base classifiers. Furthermore, we constructed ensembles of 
each type of classifier, as well as ensembles consisting of half MLPs and half RBF 
networks, because ensembles have been shown to improve upon the performance 
of their constituent or base classifiers, particularly when the correlations among 
them can be kept low [2, 18]. We now explain these methods in more detail. 

3.1 Multilayer Perceptrons 

The multilayer perceptron (MLP) is the most common neural network
representation (see [1] for a more detailed explanation). It is often depicted as a
directed graph consisting of nodes and arcs; an example is shown in figure 4.
Each column of nodes is a layer. The leftmost layer is the input layer. The
inputs of an example to be classified are entered here. The second and third
layers are the hidden layer and the output layer, respectively. Information flows
from the input layer to the hidden layer and then to the output layer via a
set of arcs. The nodes within a layer are not connected to each other. In our
example, every node in one layer is connected to every node in the next layer,
but this is not required in general. Also, an MLP can have more or fewer than
one hidden layer and can have any number of nodes in each hidden layer. One
hidden layer is considered standard for applications in order to avoid overfitting
(fitting the training data so precisely that the network captures artifacts specific
to that training set and therefore performs poorly on new data). The number
of hidden units is normally chosen as we chose it: by training
with different numbers of hidden units and choosing the number that yields the
highest performance on a separate dataset not used for training.

Figure 2: OH58c Maneuver A (Forward
Figure 3: OH58c Maneuver A (Forward

Each non-input node, its incoming arcs, and its single outgoing arc constitute
a neuron, which is the basic computational element of an MLP. Each incoming
arc multiplies the value coming from its origin node by the weight assigned to
that arc and sends the result to the destination node. The destination node
adds the values presented to it by all the incoming arcs, transforms the sum with
a nonlinear activation function (e.g., a sigmoid function), and then sends the
result along the outgoing arc. For example, the value returned by a hidden node
$z_j$ in our example MLP is

$$z_j = g\left( \sum_{i=1}^{|A|} w^{(1)}_{i,j} x_i \right),$$

where $x_i$ is the value of input unit $i$, $w^{(k)}_{i,j}$ is the weight on the
arc in the $k$th layer of arcs that goes from unit $i$ in the $k$th layer of nodes
to unit $j$ in the next layer (so $w^{(1)}_{i,j}$ is the weight on the arc that goes
from input unit $i$ to hidden unit $j$), $g$ is a nonlinear activation function,
and $|A|$ is the number of input nodes. A commonly used activation function is
the sigmoid function:

$$g(a) = \frac{1}{1 + \exp(-a)}.$$

The value returned by an output node $y_j$ is

$$y_j = g\left( \sum_{i=1}^{Z} w^{(2)}_{i,j} z_i \right),$$

where $Z$ is the number of hidden units. The outputs are clearly nonlinear
functions of the inputs. MLPs used for classification problems typically have one
output per class. The example MLP depicted in figure 4 is of this type. The
outputs lie in the range [0, 1]. Each output value is a measure of the network’s
confidence that the example presented to it is a member of that output’s
corresponding class. Therefore, the class corresponding to the highest output value
is returned as the prediction.

Figure 4: An example of a feedforward multilayer perceptron.
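
As a rough illustration of the forward computation just described, the following sketch evaluates a one-hidden-layer MLP with sigmoid units and returns the class with the highest output. It assumes no bias terms (none appear in the description above) and uses toy weights; the function names and the weight layout `weights[i][j]` are choices made for this example only.

```python
import math
from typing import List, Tuple


def sigmoid(a: float) -> float:
    """Sigmoid activation g(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))


def layer(inputs: List[float], weights: List[List[float]]) -> List[float]:
    """Apply one layer of arcs; weights[i][j] is the weight on the arc
    from unit i in this layer to unit j in the next layer."""
    n_out = len(weights[0])
    return [
        sigmoid(sum(inputs[i] * weights[i][j] for i in range(len(inputs))))
        for j in range(n_out)
    ]


def mlp_predict(x: List[float],
                w1: List[List[float]],
                w2: List[List[float]]) -> Tuple[int, List[float]]:
    """Return (index of the highest output, all outputs)."""
    z = layer(x, w1)   # hidden-unit values
    y = layer(z, w2)   # one output per class, each in [0, 1]
    best = max(range(len(y)), key=lambda j: y[j])
    return best, y


# Toy example: 2 inputs, 2 hidden units, 2 classes (weights are made up).
predicted_class, outputs = mlp_predict(
    [1.0, 0.0],
    [[2.0, -2.0], [0.0, 0.0]],
    [[3.0, -3.0], [-3.0, 3.0]],
)
```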

MLP learning performs nonlinear regression given a training set. The most 
widely used method for setting the weights in an MLP is the backpropagation
algorithm [3, 11]. For each example in the training set, its inputs are presented
to the input layer of the network and the predicted outputs are calculated. 
Then the difference between each predicted output and the corresponding target 
output is calculated. The gradient of this error with respect to the weights on 
the network’s arcs is also calculated. The weights are then adjusted according to 
this gradient so that if the training example is presented to the network again, 
then the error would be less. The learning algorithm typically cycles through 
the training set many times — each cycle is called an epoch in the neural network 
literature. 
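
A single weight update of this kind can be sketched as follows. The sketch assumes a squared-error loss, sigmoid units, no bias terms, and plain gradient descent without the momentum term; these are simplifying assumptions for illustration and the text above does not specify the loss function. All names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def forward(x: List[float], w1: Matrix, w2: Matrix) -> Tuple[List[float], List[float]]:
    """Return hidden values z and outputs y; w[i][j] connects unit i to unit j."""
    z = [sigmoid(sum(x[i] * w1[i][j] for i in range(len(x))))
         for j in range(len(w1[0]))]
    y = [sigmoid(sum(z[j] * w2[j][k] for j in range(len(z))))
         for k in range(len(w2[0]))]
    return z, y


def squared_error(y: List[float], t: List[float]) -> float:
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))


def backprop_step(x: List[float], t: List[float], w1: Matrix, w2: Matrix,
                  lr: float = 0.2) -> Tuple[Matrix, Matrix]:
    """One gradient-descent update on a single training example."""
    z, y = forward(x, w1, w2)
    # Error signal at each output unit: (y - t) * g'(a), with g' = y * (1 - y).
    d_out = [(y[k] - t[k]) * y[k] * (1.0 - y[k]) for k in range(len(y))]
    # Error signal propagated back to each hidden unit.
    d_hid = [z[j] * (1.0 - z[j]) * sum(w2[j][k] * d_out[k] for k in range(len(y)))
             for j in range(len(z))]
    new_w2 = [[w2[j][k] - lr * z[j] * d_out[k] for k in range(len(y))]
              for j in range(len(z))]
    new_w1 = [[w1[i][j] - lr * x[i] * d_hid[j] for j in range(len(z))]
              for i in range(len(x))]
    return new_w1, new_w2
```

Presenting the same example again after the update should yield a smaller error, which is the property described in the paragraph above.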

For both data sets, we determined the number of hidden units experimen- 
tally. For MLPs, we explored hidden layer sizes ranging from 5 to 100 in in- 
crements of 5, and settled on 25 hidden units for the AH1 and 65 units for 
the OH58c. We used a learning rate and momentum term of 0.2, and trained 
for 100 epochs. The performance of the MLPs was fairly insensitive to the
number of hidden units and kernels and to the learning parameters. We created
280 MLPs for each helicopter; each MLP was given a different set of random
initial weights before training, but all were trained using the same training set.

Past work [8] has asserted that MLPs are not as well-suited to fault detection 
as distance-based classifiers because real data often falls outside the range of 
the training data, requiring the MLP to extrapolate, which it does not do as 
well as distance-based classifiers. The truth is not so simple. In particular, in 
order to use a distance-based classifier, one has to choose a distance function 
that properly weighs all the attributes relative to each other. Choosing such 
a distance function is very difficult. Also, the approach in [8] uses a nearest- 
neighbor classifier to classify examples as coming from normal operation or one 
of a designated set of faulty operation modes. As discussed earlier, this approach 
is not applicable to our aircraft fault detection domain because the set of possible 
faults is unknown and fault data are nonexistent. Also, as asserted in [8], nearest-
neighbor classifiers do not require significant training time because training
consists of merely storing the training examples. Storing all of the training
examples is clearly impractical here because of the amount of training data collected.


3.2 Radial Basis Function Networks 

As their name implies, radial basis function (RBF) networks (see [1] for a more
detailed explanation) create a set of spherical basis functions from the training
set, and then fit the outputs using these basis functions. In particular, the fitted
function is of the form

$$y_k(x) = \sum_{j=1}^{J} w_{k,j} \phi_j(x),$$

where the $J$ basis functions $\phi_j(x)$ ($j \in \{1, 2, \ldots, J\}$) are

$$\phi_j(x) = \exp\left( -\frac{\|x - \mu_j\|^2}{2\sigma_j^2} \right)$$

and $k$ indexes over the outputs. Just as with MLPs, RBF networks used for
classification typically have as many outputs as possible classes. The functions
$\phi_j(x)$ are Gaussian radial basis functions with means $\mu_j$ and variances
$\sigma_j^2$. These parameters are determined in the first stage of RBF network
learning using an algorithm that finds the means and widths of clusters of data
points in the input space (the space of $x$ values). Therefore, the outputs ($y$'s)
are not used in the first stage of learning. The cluster means and widths
correspond to means and variances of the radial basis functions. The second stage
determines the parameters $w_{k,j}$. These parameters represent different possible
output values. The predicted output for a new input is calculated as a linear
combination of the $w_{k,j}$'s weighted by $\phi_j(x)$, which decreases with the
distance of the new example from the center of the $j$th RBF.
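
The fitted function above can be evaluated with a short sketch. It assumes the common Gaussian form with a factor of 2 in the denominator and takes the centers, widths, and second-stage weights as given; how they are learned is not shown. The names and the toy values are hypothetical.

```python
import math
from typing import List, Sequence


def gaussian_rbf(x: Sequence[float], center: Sequence[float], width: float) -> float:
    """Gaussian basis function: equals 1 at the center and decays with distance."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def rbf_outputs(x: Sequence[float],
                centers: Sequence[Sequence[float]],
                widths: Sequence[float],
                weights: Sequence[Sequence[float]]) -> List[float]:
    """Return one output per class; weights[k][j] multiplies basis function j
    in the sum for output k."""
    phi = [gaussian_rbf(x, c, s) for c, s in zip(centers, widths)]
    return [sum(w_k[j] * phi[j] for j in range(len(phi))) for w_k in weights]


# Toy example with two centers and two classes (all values are made up).
outputs = rbf_outputs(
    [0.0, 0.0],
    centers=[[0.0, 0.0], [3.0, 4.0]],
    widths=[1.0, 1.0],
    weights=[[1.0, 0.0], [0.0, 1.0]],
)
predicted_class = max(range(len(outputs)), key=lambda k: outputs[k])
```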

For the RBF networks, we used 100 centers for the OH58c data and
determined each kernel’s center and width using the nearest 300 patterns.¹ For the
AH1 data, we used 55 kernels with the centers and widths determined by the
nearest 500 patterns. For each helicopter, we created 100 RBF networks, each
of which had a different set of centers.²

3.3 Ensembles 

Ensembles are combinations of multiple base models, each of which may be a
traditional machine learning model such as an MLP or RBF network. When a
new example is to be classified, it is presented to the ensemble’s base models
and their outputs are combined in some manner (e.g., voting or averaging) to
yield the ensemble’s prediction. Intuitively, we would like to have base models
that perform well and do not make highly-correlated errors. We can see the
intuition behind this point graphically in figure 5. The goal of the learning
problem depicted in the figure is to separate the positive examples ('+') from
the negative examples ('-'). The figure depicts an ensemble of three linear
classifiers. For example, classifier C classifies examples above it as negative
examples and examples below it as positive examples. Note that none of the
three lines separates the positive and negative examples perfectly. For example,
classifier C misclassifies all the positive examples in the top half of the figure.
Indeed, no linear classifier can separate the positive examples from the negative
examples. However, the ensemble of three lines, where each line gets one vote,
correctly classifies all the examples: for every example, at least two of the three
linear classifiers classify it correctly, so the majority is always correct. This is
the result of having three very different linear classifiers in the ensemble. This
example clearly depicts the need to have base models whose errors are not highly
correlated. If all the linear classifiers make mistakes on the same examples (for
example, if the ensemble consisted of three copies of line A), then a majority
vote over the lines would also make mistakes on the same examples, yielding no
performance improvement.

¹ That is, for each center, the 300 training cases closest to it in Euclidean distance were
used to determine its radius. Therefore, the radius increases with the number of neighboring
training cases used.

² Due to the large computation time needed to obtain the centers and widths of the kernels
on such large data sets, we only used 100 RBF networks as opposed to 280 MLPs.

Figure 5: An ensemble of linear classifiers. Each line A, B, and C is a linear
classifier. The boldface line is the ensemble that classifies new examples by
returning the majority vote of A, B, and C.
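
The majority-vote argument can be checked on a small made-up example. In the sketch below, three hypothetical classifiers each misclassify a different example, so the vote is correct everywhere, whereas three copies of the same classifier reproduce its mistake. The labels and predictions are invented for illustration and do not correspond to figure 5.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(votes: Sequence[str]) -> str:
    """Return the label that receives the most votes."""
    return Counter(votes).most_common(1)[0][0]


def ensemble_predictions(base_predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-classifier prediction lists example by example."""
    return [majority_vote(votes) for votes in zip(*base_predictions)]


truth = ["+", "+", "-", "-", "+"]
a = ["-", "+", "-", "-", "+"]   # wrong on example 0
b = ["+", "-", "-", "-", "+"]   # wrong on example 1
c = ["+", "+", "+", "-", "+"]   # wrong on example 2

diverse = ensemble_predictions([a, b, c])   # errors are uncorrelated
copies = ensemble_predictions([a, a, a])    # errors are perfectly correlated
```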

The intuition that we have just described has been formalized [14, 16].
Ensemble learning can be justified in terms of the bias and variance of the learned
model. It has been shown that, as the correlations of the errors made by the
base models decrease, the variance of the error of the ensemble decreases and
is less than the variance of the error of any single base model. If $E_{add}$ is the
average additional error of the base models (beyond the Bayes error, which is
the minimum possible error that can be obtained), $E_{add}^{ave}$ is the additional error
of an ensemble that computes the average of the base models’ outputs, and
$\rho$ is the average correlation of the errors of the base models, then Turner and
Ghosh [14] have shown that

$$E_{add}^{ave} = \frac{1 + \rho (M - 1)}{M} E_{add},$$

where $M$ is the number of base models in the ensemble. The effect of the
correlations of the errors made by the base models is made clear by this equation.
If the base models always agree, then $\rho = 1$; therefore, the errors of the ensemble
and the base models would be the same and the ensemble would not yield any
improvement. If the base models’ errors are independent, then $\rho = 0$, which
means the ensemble’s error is reduced by a factor of $M$ relative to the base
models’ errors. It is possible to do even better by having base models with
slightly anti-correlated errors. If $\rho = -\frac{1}{M-1}$, then the ensemble’s error would
be zero.
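
The three cases discussed above follow directly from the factor (1 + ρ(M − 1))/M. The following sketch evaluates that factor for an ensemble of ten base models; the function name and the choice of M = 10 are only for illustration.

```python
def ensemble_error_ratio(rho: float, m: int) -> float:
    """Ratio of the averaging ensemble's additional error to the average
    additional error of its base models: (1 + rho * (m - 1)) / m."""
    return (1.0 + rho * (m - 1)) / m


m = 10
identical = ensemble_error_ratio(1.0, m)                  # base models always agree
independent = ensemble_error_ratio(0.0, m)                # independent errors
anti_correlated = ensemble_error_ratio(-1.0 / (m - 1), m) # slightly anti-correlated
```

With these inputs, identical base models give a ratio of 1 (no improvement), independent errors give 1/M, and ρ = −1/(M − 1) drives the ratio to zero up to floating-point rounding.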

For both data sets and classifiers, we used simple averaging ensembles. That 
is, since MLPs and RBF networks return a measure of confidence for each 
possible class, the averaging ensemble calculates, for each class, the average of 
all the networks’ confidences in that class. The class with the highest average is 
returned as the ensemble’s prediction. Though simple to apply, such ensembles 
perform remarkably well on a variety of data sets [2, 13, 14]. We experimented 
with ensembles consisting of 2 to 100 base classifiers for MLP and MLP/RBF 
ensembles, and 2 to 50 base classifiers for RBF ensembles, although performance 
improvements after 10 base classifiers were marginal. These ensembles consisted 
of random samples drawn from the 280 MLPs and 100 RBF networks that we 
created for the single-network experiments. For each size of ensemble, we drew 
20 sets of random samples and report the results as averages over these runs. 
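
A simple averaging ensemble of this kind can be sketched as follows. The confidence values are invented, and the function name is hypothetical; the sketch only shows the averaging and the final selection of the class with the highest average.

```python
from typing import List, Sequence, Tuple


def average_ensemble(confidences: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """confidences[m][c] is network m's confidence in class c.
    Returns (class with the highest average confidence, per-class averages)."""
    n_models = len(confidences)
    n_classes = len(confidences[0])
    averages = [
        sum(conf[c] for conf in confidences) / n_models
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=lambda c: averages[c])
    return best, averages


# Three hypothetical networks, three classes. The first network alone would
# pick class 0, but the averaged confidences favor class 1.
prediction, averages = average_ensemble([
    [0.6, 0.4, 0.0],
    [0.1, 0.7, 0.2],
    [0.2, 0.5, 0.3],
])
```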

3.4 Data Preparation 

We created data sets for each of the two aircraft by combining its 176 summary 
matrices. This resulted in 31475 patterns (revolutions) for the AH1 and 34144 
for the OH58c. Both types of classifiers were trained using a randomly-selected 
two-thirds of the data (21000 examples for the AH1, 23000 for the OH58c) 
and were tested on the remainder for the first set of experiments. For both 
aircraft, we used various subsets of the inputs. In particular, for the AH1, we 
ran experiments using only the bus data and only the vibration data as inputs, 
in addition to using all the data. 
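
A random split of this kind can be sketched as follows. The function name, the fixed seed, and the use of pattern indices rather than the patterns themselves are assumptions made for the example; the sizes are those reported above for the AH1.

```python
import random
from typing import List, Tuple


def random_split(n_patterns: int, n_train: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle the pattern indices and return (training indices, test indices)."""
    indices = list(range(n_patterns))
    random.Random(seed).shuffle(indices)   # seeded for repeatability
    return indices[:n_train], indices[n_train:]


# AH1 sizes from the text: 31475 patterns, 21000 of them used for training.
train_idx, test_idx = random_split(31475, 21000)
```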

In addition, we calculated the confusion matrix of every classifier we created.
Entry (i, j) of the confusion matrix of a classifier states the number of times
that an example of class i is classified as class j. In examining the confusion
matrices of the classifiers (see table 4 for an example of a confusion matrix; entry
(1,1) is in the upper left corner), we noticed that particular maneuvers were
continually being confused with one another. In particular, the three hover
maneuvers (8-Hover, 9-Hover Turn Left, and 10-Hover Turn Right) were frequently
confused with one another and the two coordinated turns (11-Coordinated Turn
Left and 12-Coordinated Turn Right) were also frequently confused (the counts
associated with these errors are shown in boldface type in table 4). These sets
of maneuvers are similar enough to one another that misclassifications within
these groups are unlikely to imply the presence of faults. Therefore, for the
second set of experiments, we recalculated the classification accuracies allowing
for these misclassifications. For our third set of experiments, we consolidated
these two sets of maneuvers in the data before running the experiments. That
is, we combined the hover maneuvers into one class and the coordinated turns
into one class, yielding a total of 11 possible predictions instead of the original
14. We expected the performance to be best for this third set of experiments
because, informally, the classifiers do not have to waste resources distinguishing
among the maneuvers within each of the two sets of similar maneuvers.

Table 4: Sample confusion matrix for OH58c (MLP).
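
The confusion matrix and the accuracy recalculated while allowing confusions within a group can be sketched as follows. Class indices are 0-based here, so the hover maneuvers 8 to 10 become indices 7 to 9 and the coordinated turns 11 and 12 become indices 10 and 11; the function names and the tiny label lists are invented for illustration.

```python
from typing import Dict, List, Sequence


def confusion_matrix(true_labels: Sequence[int], predicted: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """Entry [i][j] counts examples of class i that were classified as class j."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix: List[List[int]],
             groups: Sequence[Sequence[int]] = ()) -> float:
    """Fraction of examples counted as correct. A prediction is also counted
    as correct when the true and predicted classes belong to the same group."""
    group_of: Dict[int, int] = {}
    for g, members in enumerate(groups):
        for c in members:
            group_of[c] = g
    total = sum(sum(row) for row in matrix)
    correct = 0
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            same_group = i in group_of and group_of[i] == group_of.get(j)
            if i == j or same_group:
                correct += count
    return correct / total


HOVERS = [7, 8, 9]    # Hover, Hover Turn Left, Hover Turn Right
TURNS = [10, 11]      # Coordinated Turn Left, Coordinated Turn Right

cm = confusion_matrix([7, 8, 10, 0], [8, 8, 11, 1], n_classes=14)
strict = accuracy(cm)
allowing_confusions = accuracy(cm, groups=[HOVERS, TURNS])
```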

Finally, we used the knowledge that a helicopter needs some time to change 
maneuvers. That is, two sequentially close patterns are unlikely to come from 
different maneuvers. To obtain results that use this “prior” knowledge, we 
tested on sequences of revolutions by averaging the classifiers’ outputs on a 
window of examples surrounding the current one. In one set of experiments, 
we averaged over windows of size 17 (8 revolutions before the current one, the 
current one, and 8 revolutions after the current one) which corresponds to about 
three seconds. Because the initial training and test sets were randomly chosen 
from this sequence, this averaging could not be performed on the test set alone. 
Instead, it was performed on the full data set for both helicopters. To allow
meaningful comparisons of these results, we also computed the errors of the
single-revolution classifiers on this full dataset and present them in tables 6
and 8.³
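
The windowed averaging described above can be sketched as follows. A half-width of 8 gives the window of 17 revolutions used in the text. How the window is handled at the start and end of the sequence is not stated, so the sketch simply truncates it there; that choice, like the function name and the toy sequence, is an assumption.

```python
from typing import List, Sequence


def windowed_average(outputs: Sequence[Sequence[float]],
                     half_width: int = 8) -> List[List[float]]:
    """Average each class's output over a window of 2 * half_width + 1
    consecutive revolutions centered on the current one.
    The window is truncated at the ends of the sequence (an assumption)."""
    n = len(outputs)
    n_classes = len(outputs[0])
    smoothed = []
    for t in range(n):
        window = outputs[max(0, t - half_width):min(n, t + half_width + 1)]
        smoothed.append([
            sum(o[c] for o in window) / len(window) for c in range(n_classes)
        ])
    return smoothed


# Toy sequence: class 0 throughout, with one noisy revolution at t = 10.
raw = [[0.9, 0.1] for _ in range(20)]
raw[10] = [0.2, 0.8]
smoothed = windowed_average(raw)
raw_classes = [max(range(2), key=lambda c: o[c]) for o in raw]
smoothed_classes = [max(range(2), key=lambda c: o[c]) for o in smoothed]
```

In this toy sequence the single noisy revolution changes the raw prediction at that step, but the averaged outputs still select class 0 everywhere.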


4 Results 

In this section we describe the experimental results that we have obtained so
far. We first discuss results on the OH58c helicopter.

³ We performed this windowed averaging as though the entire data were collected over a
single flight. However, it was in fact collected in stages, meaning that there are no transitions
between maneuvers. We show these results to demonstrate the applicability of this method to
sequential data obtained in actual flight after training the network on “static” single revolution
patterns.

Table 5: OH58c Single Revolution Test Set Results.
(Only the MLP/RBF rows survive in this copy; n/r marks an entry that could not be recovered.)

Base Type  N    Single Rev      Corr    Post-Run Consolidated  Corr    Pre-Run Consolidated  Corr
MLP/RBF    n/r  80.190 ± 0.079  0.3687  92.834 ± 0.065         0.3176  93.777 ± 0.046        0.2905
MLP/RBF    4    80.946 ± 0.059  0.4352  93.189 ± 0.042         0.3997  94.097 ± 0.048        0.3788
MLP/RBF    10   81.406 ± 0.043  0.4574  93.403 ± 0.039         0.4273  94.348 ± 0.025        0.3941
MLP/RBF    100  81.543 ± 0.020  0.4681  93.463 ± 0.017         0.4392  94.457 ± 0.011        0.4056

In table 5, the column marked “Single Rev” shows the results of running
individual networks and ensembles of various sizes on the summary matrices
randomly split into training and test sets. We only present results for some of
the ensembles we constructed due to space limitations and because the ensembles
exhibited relatively small gains beyond N = 10 base models. MLPs and ensembles
of MLPs consistently outperform RBFs and ensembles of RBFs. The ensembles of
MLPs improve upon single MLPs to a greater extent than ensembles of RBF
networks do upon single RBF networks, indicating that the MLPs are more diverse
than the RBF networks. This is corroborated by the fourth column (marked
“Corr”),⁴ which shows that the average correlations among the base models are
much higher for ensembles of RBF networks than for ensembles of MLPs. Mixed
ensembles perform worse than pure-MLP ensembles and better than pure-RBF
ensembles for all numbers of base models. The correlations of the mixed ensembles
are larger than those of the pure-MLP ensembles. This shows that RBF networks
did not add enough diversity to make mixed ensembles outperform pure-MLP
ensembles. The standard errors of the mean performances decrease with increasing
numbers of base models, as is normally the case with ensembles. The column
marked “Post-Run Consolidated” shows the single revolution results after
allowing for confusions among the hover maneuvers and among the coordinated
turns, consolidating them into single classes (hover and coordinated turns). As
expected, the performances improved dramatically. The column “Pre-Run Con- 

4 Each correlation in this paper is the average of the correlations of every pair of base 
classifiers in the ensemble. We calculate the correlation of a pair of classifiers as the number 
of test patterns that the two classifiers agree on but misclassify, divided by the number of 
patterns that at least one classifier misclassifies. Note that this is not the posterior-based 
correlation used in [14, 15]. A correlation of “x” in any table indicates that there were no 
training patterns misclassified by any base classifiers; therefore, the correlation is undefined. 
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Table 6: OH58c Full Data Set Results. 


Base 

Type 

N 

Window 
of 17 

Corr 

Window 17 
Post- Consolidated 

Corr 

Window 17 
Pre-Consolidated 

Corr 


1 

89.905 ± 0.121 

- 

96.579 ± 0.066 

- 



MLP 

4 

90.922 ± 0.074 

0.5014 

96.799 ± 0.026 

0.6145 


0.6258 


10 

91.128 ± 0.064 

0.5013 

96.820 ± 0.018 

0.6255 


0.6067 


100 

91.307 ± 0.015 

0.5052 

97.063 ± 0.140 

0.6290 


0.6086 


1 

82.564 ± 0.154 

- 



94.611 ± 0.124 


RBF 

4 

82.634 ± 0.059 

0.7509 

92.882 ± 0.047 

0.7755 

94.548 ± 0.063 

0.5870 


10 

82.618 ± 0.055 

0.7543 

92.895 ± 0.043 

0.7758 

94.517 ± 0.029 

0.6001 


50 

82.644 ± 0.019 

0.7505 


0.7747 

94.524 ± 0.012 

0.6072 


2 

88.674 ± 0.108 

|K 



97.155 ± 0.045 


MLP/ 

4 

88.895 ± 0.078 



1 

97.145 ± 0.067 


RBF 

10 

89.140 ± 0.057 


1 


97.226 ± 0.032 



100 

89.320 ± 0.025 




97.204 ± 0.009 


Base 

Type 

N 

Single 

Rev 

Corr 

Single Rev 
Post Consolidated 

Corr 

Single Rev 
Pre-Consolidated 

Corr 


1 

82.097 ± 0.072 


93.539 ± 0.058 

- 


- 

MLP 

4 

84.304 ± 0.049 

0.4069 

94.622 ± 0.039 

0.4019 


0.4443 


10 

84.750 ± 0.043 

0.4075 

94.805 ± 0.028 

0.4029 


0.4372 


100 

85.048 ± 0.012 

0.4081 

94.922 ± 0.011 

0.4036 


0.4355 


1 

76.406 ± 0.099 

- 

89.680 ± 0.077 

- 

90.788 ± 0.147 

- 

RBF 

4 

76.799 ± 0.040 

0.7164 

89.872 ± 0.039 

0.7142 

91.187 ± 0.045 

WMm\ 


10 

76.836 ± 0.033 

0.7186 

89.902 ± 0.027 

0.7162 

91.244 ± 0.027 

0.6157 



76.910 ± 0.011 

0.7162 

89.948 ± 0.007 

0.7143 

91.271 ± 0.013 

0.6182 


2 


BEM£1I 

SKIPS 

0.3172 

■ 

0.2883 

MLP/ 

4 


0.4293 




0.3783 

RBF 





0.4291 


0.3948 






0.4406 

wSffSCSjRfH 



solidated” shows the single revolution results on the summary matrices in which 
the hovers and coordinated turns were consolidated as described in section 3.4. 
Performance here was consistently the highest, as we hypothesized.

The top half of table 6 shows, in the column marked “Window of 17,” the
results of the windowed averaging described in the previous section. The
columns “Window 17 Post-Consolidated” and “Window 17 Pre-Consolidated”
give the results allowing for the confusions mentioned earlier. The bottom half
of the table gives the full-set results of the single-revolution classifiers.
The benefit of windowed averaging, which smooths out some of the noise in the
data, is clear: for example, a single MLP improves from 82.097% on single
revolutions to 89.905% with a window of 17.
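
The smoothing effect can be illustrated with a minimal sketch. It assumes that
each single-revolution classifier emits one score per maneuver and that the
windowed prediction is the maneuver with the highest mean score over 17
consecutive revolutions; the function name, the sliding-window layout and the
toy scores are illustrative assumptions, not the implementation used in the
experiments.

```python
def windowed_predictions(scores, window=17):
    """Average per-revolution class scores over a sliding window.

    scores: list of per-revolution score lists (one score per maneuver).
    Returns one predicted class index per full window.
    """
    if window < 1 or len(scores) < window:
        return []
    n_classes = len(scores[0])
    predictions = []
    for start in range(len(scores) - window + 1):
        block = scores[start:start + window]
        # Mean score of each class across the revolutions in this window.
        means = [sum(row[c] for row in block) / window for c in range(n_classes)]
        predictions.append(max(range(n_classes), key=means.__getitem__))
    return predictions


# Toy data (hypothetical): class 0 is correct, two revolutions are noisy.
noisy = [[0.7, 0.3] for _ in range(17)]
noisy[3] = [0.4, 0.6]
noisy[9] = [0.45, 0.55]

# Per-revolution decisions make two errors; the windowed decision does not.
single_rev = [max(range(2), key=row.__getitem__) for row in noisy]
print(single_rev.count(1))
print(windowed_predictions(noisy))
```

In this toy case the two noisy revolutions are outvoted by the other fifteen,
which is the kind of noise reduction the window is meant to provide.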

Table 7 shows the results with the AH1 summary matrices randomly split
into training and test sets. Table 8 gives the windowed-averaging results (top
half) and the single-revolution results (bottom half) on the full AH1 data set.
These results are substantially better than the OH58c results. We expected this because

Table 7: AH1 Single Revolution Test Set Results.

Base Type  N    Single Rev      Corr    Post-Run Consolidated  Corr    Pre-Run Consolidated  Corr
MLP        1    96.752 ± 0.059  -       99.843 ± 0.032         -       99.990 ± 0.002        -
           4    97.284 ± 0.031  0.4155  99.975 ± 0.010         0.1100  99.997 ± 0.001        X
           10   97.448 ± 0.027  0.4130  99.992 ± 0.001         X       99.994 ± 0.001        X
           100  97.542 ± 0.006  0.4128  99.995 ± 0.001         X       99.992 ± 0.001        X
RBF        1    95.669 ± 0.059  -       99.626 ± 0.017         -       99.695 ± 0.011        -
           4    95.946 ± 0.029  0.6462  99.706 ± 0.010         0.3750  99.751 ± 0.009        0.4230
           10   95.911 ± 0.023  0.6561  99.711 ± 0.006         0.3824  99.757 ± 0.005        0.4251
           50   95.946 ± 0.009  0.6538  99.716 ± 0.003         0.3791  99.761 ± 0.002        0.4215
MLP/RBF    2    97.040 ± 0.054  0.3120  99.980 ± 0.004         0.0045  99.994 ± 0.002        0.0049
           4    97.318 ± 0.025  0.3698  99.986 ± 0.003         0.0859  99.998 ± 0.001        X
           10   97.429 ± 0.018  0.4040  99.990 ± 0.002         0.1222  99.998 ± 0.001        X
           100  97.521 ± 0.011  0.4160  99.998 ± 0.001         X       100.000 ± 0.000       X

the AH1 is a heavier helicopter, so it is less affected by conditions that tend
to introduce noise, such as wind changes. On the AH1's summary matrices
without consolidation, the mixed ensembles outperform the pure ensembles for
small numbers of base models but perform worse than the MLP ensembles
for larger numbers of base models. With consolidation, the mixed ensembles
outperform the pure ensembles more often; however, the performances are all
very high. Once again, ensembles of MLPs outperform single MLPs to a greater
extent than ensembles of RBF networks outperform single RBF networks, which
suggests that the RBF networks are less different from one another than the
MLPs are; the higher correlations among the RBF networks point the same way.
Unlike with the OH58c, adding a few RBF networks to an MLP ensemble helped on
the AH1. The standard errors of the mean performances tend to decrease with
increasing numbers of base models, just as with the OH58c.
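
The “mean ± standard error” entries in the tables can be computed as in the
following sketch. It assumes that the reported standard error is the sample
standard deviation across independent runs divided by the square root of the
number of runs; the accuracies below are made-up placeholders, not values from
the experiments.

```python
import math
import statistics


def mean_and_sem(accuracies):
    """Return (mean, standard error of the mean) for a list of run accuracies."""
    n = len(accuracies)
    if n < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    mean = statistics.mean(accuracies)
    # Sample standard deviation (n - 1 denominator) scaled by sqrt(n).
    sem = statistics.stdev(accuracies) / math.sqrt(n)
    return mean, sem


# Hypothetical accuracies (percent correct) from four independent runs.
runs = [96.0, 97.0, 98.0, 97.0]
mean, sem = mean_and_sem(runs)
print(f"{mean:.3f} ± {sem:.3f}")
```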

On the AH1, the hover maneuvers were frequently confused, just as they were
on the OH58c, but the coordinated turns were not. Taking this confusion into
account boosted performance significantly, raising single-revolution accuracy
from between 95.7% and 97.8% to at least 99.6% (tables 7 and 8). The windowed
averaging approach did not always yield improvement when allowing for the
maneuver confusions, but it helped when classifying across the full set of
maneuvers. However, in all cases in which windowed averaging did not help, the
classifier performance was at least 99.6%, so there was very little room for
improvement.
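
One way to allow for the maneuver confusions when scoring, sketched below, is
to map every maneuver label to a consolidated group before comparing
predictions with the true labels. The maneuver names and the grouping are
hypothetical placeholders; the actual consolidation is the one described in
section 3.4.

```python
def consolidated_accuracy(true_labels, predicted_labels, groups):
    """Percentage correct after mapping labels to consolidated groups.

    groups: dict from a maneuver label to its group name; labels that are
    absent from the dict form their own group.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("no labels to score")
    hits = sum(
        groups.get(t, t) == groups.get(p, p)
        for t, p in zip(true_labels, predicted_labels)
    )
    return 100.0 * hits / len(true_labels)


# Hypothetical labels: two hover variants that are treated as one maneuver.
groups = {"hover_a": "hover", "hover_b": "hover"}
truth = ["hover_a", "hover_b", "forward_flight", "hover_a"]
preds = ["hover_b", "hover_b", "forward_flight", "forward_flight"]
print(consolidated_accuracy(truth, preds, {}))      # strict scoring
print(consolidated_accuracy(truth, preds, groups))  # hovers consolidated
```

A prediction of one hover variant for another counts as an error under strict
scoring but as correct once the variants are consolidated.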


5 Discussion 

In this paper, we presented an approach to fault detection that contains a sub- 
system to classify an operating aircraft into one of several states. More specifi- 
cally, the proposed subsystem determines the maneuver being performed by an 

Table 8: AH1 Full Data Set Results.

Base Type  N    Window of 17    Corr    Window 17 Post-Consolidated   Corr         Window 17 Pre-Consolidated   Corr
MLP        1    98.344 ± 0.059  -       99.737 ± 0.028                -            100.000 ± 0.000              -
           4    98.757 ± 0.031  0.4052  99.811 ± 0.005                0.4111       100.000 ± 0.000              X
           10   98.779 ± 0.021  0.4105  99.815 ± 0.002                0.4044       100.000 ± 0.000              X
           100  98.861 ± 0.006  0.4055  99.816 ± 0.001                0.4127       100.000 ± 0.000              X
RBF        1    96.662 ± 0.102  -       99.404 ± 0.013                -            99.653 ± 0.010               -
           4    96.988 ± 0.042  0.6668  99.431 ± 0.012                0.6607       99.659 ± 0.021               0.6098
           10   96.968 ± 0.028  0.6764  99.428 ± 0.008                0.6636       99.676 ± 0.007               0.6049
           50   97.003 ± 0.008  0.6735  99.438 ± 0.003                0.6645       99.696 ± 0.003               0.6076
MLP/RBF    2    98.256 ± 0.064  0.2313  99.690 ± 0.006                0.1915       99.908 ± 0.003               0.0000
           4    98.482 ± 0.034  0.3148  99.682 ± 0.004                0.3085       99.901 ± 0.013               X
           10   98.475 ± 0.028  0.3577  99.683 ± 0.003                0.3396       99.918 ± 0.002               X
           100  98.553 ± 0.005  0.3739  99.687 ± 0.001                [illegible]  99.920 ± 0.001               X

Base Type  N    Single Rev      Corr    Single Rev Post-Consolidated  Corr         Single Rev Pre-Consolidated  Corr
MLP        1    96.933 ± 0.060  -       99.826 ± 0.037                -            99.992 ± 0.009               -
           4    97.555 ± 0.025  0.3966  99.975 ± 0.014                0.3795       99.997 ± 0.007               X
           10   97.683 ± 0.013  0.3973  99.994 ± 0.009                0.3824       99.997 ± 0.005               X
           100  97.762 ± 0.008  0.3981  99.996 ± 0.009                0.3836       99.997 ± 0.001               X
RBF        1    95.743 ± 0.067  -       99.676 ± 0.014                -            99.726 ± 0.012               -
           4    96.063 ± 0.032  0.6369  99.738 ± 0.005                0.6192       99.767 ± 0.008               0.4129
           10   96.042 ± 0.026  0.6456  99.742 ± 0.009                0.6272       99.773 ± 0.009               0.4156
           50   96.067 ± 0.005  0.6321  99.747 ± 0.000                0.6137       99.781 ± 0.002               0.4150
MLP/RBF    2    97.231 ± 0.055  0.2933  99.984 ± 0.000                0.0073       99.997 ± 0.005               0.0025
           4    97.502 ± 0.028  0.3539  99.988 ± 0.005                0.0915       99.998 ± 0.005               X
           10   97.570 ± 0.018  0.3899  99.993 ± 0.005                0.1225       99.999 ± 0.003               X
           100  97.659 ± 0.008  0.3978  99.999 ± 0.005                X            100.000 ± 0.000              X


Table 9: AH1 Bus and Non-Bus Results.

Inputs     Single Rev      Single Rev Consolidated  Window of 17    Window of 17 Consolidated
All        [illegible]     [illegible]              98.344 ± 0.059  99.737 ± 0.028
Bus        90.380 ± 0.110  95.871 ± 0.091           91.209 ± 0.126  96.027 ± 0.086
Non-Bus    [illegible]     [illegible]              92.913 ± 0.355  96.110 ± 0.236
Agreement  79.523 ± 0.247  90.063 ± 0.202           85.609 ± 0.320  93.393 ± 0.247

aircraft as a function of vibration data and any other available data. Through
experiments with two helicopters, the OH58c and the AH1, we demonstrated that
the subsystem determines the maneuver being performed with good reliability,
which makes it a promising component of the envisioned system. Future work will
apply this approach to “free-flight data,” in which the maneuvers are not
static or steady-state and transitions between maneuvers occur.

The results presented in this paper address the maneuver classification
portion of the online fault detection system envisioned in this research and
shown in figure 1. To address the overall detection problem, future work will
involve experiments to determine the probabilities of agreement between
different classifiers, so that a mismatch can be used to detect possible
faults. For example, for the AH1 helicopter, we trained some classifiers on
just the data from a 1553 bus (as described in section 2) and compared them
with classifiers trained on all of the data except the bus data. Table 9 shows
the results of training 20 single MLPs on these data using the same network
topology as for the other MLPs trained on all the AH1 data. They performed
much worse than the single MLPs trained with all the inputs presented at once:
with a window of 17, for instance, 91.209% (bus) and 92.913% (non-bus) versus
98.344% (all inputs). The last row of the table gives the percentage of
maneuvers for which the two types of classifiers agreed.
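
The agreement figure can be computed as in the sketch below, which simply
counts the patterns on which two classifiers return the same maneuver. The
function name and the prediction lists are hypothetical and serve only to
illustrate the measure.

```python
def agreement_percentage(predictions_a, predictions_b):
    """Percentage of patterns on which two classifiers predict the same maneuver."""
    if len(predictions_a) != len(predictions_b):
        raise ValueError("prediction lists must have the same length")
    if not predictions_a:
        raise ValueError("no predictions to compare")
    matches = sum(a == b for a, b in zip(predictions_a, predictions_b))
    return 100.0 * matches / len(predictions_a)


# Hypothetical outputs of a bus-only and a non-bus classifier.
bus_preds = ["hover", "forward_flight", "hover", "turn", "hover"]
non_bus_preds = ["hover", "forward_flight", "turn", "turn", "hover"]
print(agreement_percentage(bus_preds, non_bus_preds))
```

Note that agreement says nothing about correctness: two classifiers can agree
on a wrong maneuver, so this measure complements the accuracies rather than
replacing them.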

Recall from section 1 that we would like classifier disagreement to indicate
the presence of a fault; therefore, on fault-free data such as these, we would
like these agreement probabilities to be much higher. However, we hypothesize
that we can use the bus data in a much simpler way. For example, if a
vibration-based classifier predicts that the aircraft is in forward flight, but
the bus data indicate that the altitude is zero, then the probability of a
fault is high. We do not necessarily need a system that returns the maneuver as
a function of all the variables that constitute the bus data. In this example,
we merely need to know that a zero altitude is inconsistent with forward
flight. We plan to perform a detailed study of the collected bus data so that
we can construct simple classifiers that represent knowledge of this type and
use them to find inconsistencies such as the one just described.
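
A consistency rule of this kind might look like the following sketch. The
maneuver label, the altitude argument and the tolerance are hypothetical
names chosen for illustration; the rules that will actually be used depend on
the planned study of the bus data.

```python
# Maneuvers that cannot occur on the ground (hypothetical label set; other
# maneuvers could be added once the bus data have been studied).
AIRBORNE_MANEUVERS = {"forward_flight"}


def possible_fault(predicted_maneuver, altitude, ground_tolerance=0.0):
    """Return True when the predicted maneuver contradicts the bus altitude.

    altitude is the bus-reported altitude; its unit only has to match
    ground_tolerance. A reading at or below the tolerance is treated as
    "on the ground", which is inconsistent with an airborne maneuver.
    """
    on_ground = altitude <= ground_tolerance
    return on_ground and predicted_maneuver in AIRBORNE_MANEUVERS


print(possible_fault("forward_flight", altitude=0.0))    # inconsistent
print(possible_fault("forward_flight", altitude=150.0))  # consistent
```

The rule needs only one bus variable and one fact about the maneuver, which is
the sense in which it is simpler than a full bus-data classifier.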

There is ongoing work within our research group to model aircraft engine
operation from “first principles.” In particular, models of the gear system are
being prepared so that simulated data may be collected. We plan to use this
simulation to insert cracks and other types of faults in the gear system in
order to learn how the data change as a function of these faults. This
information can then be used to mathematically insert faults into the real
data, which would give us the fault data that we clearly cannot collect from
the aircraft directly. We hope to generate such fault data and test whether our
classification subsystems react to them in the way we expect.